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O ' ABSTRACT 

O ' 

In complex systems with many degrees of freedom such as peptides and proteins there 
exist a huge number of local-minimum-energy states. Conventional simulations in the 
canonical ensemble are of little use, because they tend to get trapped in states of these 
energy local minima. A simulation in generalized ensemble performs a random walk 
in potential energy space and can overcome this difficulty From only one simulation 
run, one can obtain canonical-ensemble averages of physical quantities as functions of 
temperature by the single-histogram and/or multiple-histogram reweighting techniques. 
In this article we review uses of the generalized-ensemble algorithms. Three well-known 
methods, multicanonical algorithm, simulated tempering, and replica-exchange method, 
are described first. Both Monte Carlo and molecular dynamics versions of the algorithms 
are given. We then present three new generalized-ensemble algorithms which combine the 
merits of the above methods. The effectiveness of the methods for molecular simulations 
in the protein folding problem is tested with short peptide systems. 
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1 INTRODUCTION 

Despite the great advancement of computer technology in the past decades, simulations 
of complex systems such as spin glasses and biopolymers are still greatly hampered by the 
multiple-minima problem. It is very difficult to obtain accurate canonical distributions 
at low temperatures by conventional Monte Carlo (MC) and molecular dynamics (MD) 
methods. This is because simulations at low temperatures tend to get trapped in one of 
huge number of local-minimum-energy states. The results thus will depend strongly on 
the initial conditions. One way to overcome this multiple-minima problem is to perform 
a simulation in a generalized ensemble where each state is weighted by a non-Boltzmann 
probability weight factor so that a random walk in potential energy space may be realized. 
The random walk allows the simulation to escape from any energy barrier and to sample 
much wider phase space than by conventional methods. Monitoring the energy in a 
single simulation run, one can obtain not only the global-minimum-energy state but also 
canonical ensemble averages as functions of temperature by the single-histogram [|I|] and/or 
multiple-histogram 0, |3| reweighting techniques (an extension of the multiple-histogram 
method is referred to as Weighted Histogram Analysis Method [H). 

One of the most well-known generalized-ensemble methods is perhaps multicanonical 
algorithm (MUCA) 0, |5j (for a recent review, see Ref. ||). (The method is also referred to 
as entropic sampling [7| and adaptive umbrella sampling || . The mathematical equivalence 
of multicanonical algorithm and entropic sampling has been given in Ref. 0.) MUCA 
and its generalizations have been applied to spin glass systems (see, e.g., Refs. 



MUCA was also introduced to the molecular simulation field |14| (for previous reviews of 
generalized-ensemble approach in the protein folding problem, see, e.g., Refs. [P^ -[P^]). 
Since then MUCA has been extensively used in many applications in protein and related 
systems |18||— 1|44[|. Molecular dynamics version of MUCA has also been developed 



see also Refs. [^, |26| for Langevin dynamics version). Moreover, multidimensional (or 



multicomponent) extensions of MUCA can be found in Refs. p5 , [29| , [33 



While a simulation in multicanonical ensemble performs a free ID random walk in 
energy space, that in simulated tempering (ST) |47j (the method is also referred to as 
the method of expanded ensemble |36|]) performs a free random walk in temperature space 



(for a review, see, e.g., Ref. f48|). This random walk, in turn, induces a random walk 
in potential energy space and allows the simulation to escape from states of energy local 



minima. ST has also been applied to protein folding problem [[IS 

A third generalized-ensemble algorithm that is related to MUCA is 1/k-sampling 
A simulation in 1/k-sampling performs a free random walk in entropy space, which, in 
turn, induces a random walk in potential energy space. The relation among the above 
three generalized-ensemble algorithms was discussed and the effectiveness of the three 
methods in protein folding problem was compared f51 |. 



The generalized-ensemble method is powerful, but in the above three methods the 
probability weight factors are not a priori known and have to be determined by iterations 
of short trial simulations. This process can be non-trivial and very tedius for complex 
systems with many local-minimum-energy states. Therefore, there have been attempts to 
accelerate the convergence of the iterative process for MUCA |TI| [53], 5SJ, |£J (see 
also Ref. §). 

A new generalized-ensemble algorithm that is based on the weight factor of Tsallis sta- 
tistical mechanics pTl was recently developed with the hope of overcoming this difficulty 
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and the method was applied to a peptide folding problem [60. 31 

See also Ref. 



A similar but 

slightly different formulation is given in Ref. p2| . See also Ref. for a combination of 
Tsallis statistics with simulated tempering. (Optimization problems were also addressed 
by simulated annealing algorithms |)4] based on the Tsallis weight in Refs. H65|-fl67|. 
For reviews of molecular simulations based on Tsallis statistics, see, e.g., Refs. [|68|j-[|70|j.) 
In this generalized ensemble the weight factor is known, once the value of the global- 
minimum energy is given p8[ . The advantage of this ensemble is that it greatly simplifies 



the determination of the weight factor. However, the estimation of the global-minimum 
energy can still be very difficult. 

In the replica- exchange method (REM) 
termination is greatly alleviated, 
lier in Ref. 



n]]-[[73|, the difficulty of weight factor de- 
(A similar method was independently developed ear- 
REM is also referred to as replica Monte Carlo method |7^| , multiple 
Markov chain method |ff5 |, and parallel tempering |48| .) In this method, a number of 
non-interacting copies of the original system (or replicas) at different temperatures are 
simulated independently and simultaneously by the conventional MC or MD method. Ev- 
ery few steps, pairs of replicas are exchanged with a specified transition probability. The 
weight factor is just the product of Boltzmann factors, and so it is essentially known. 

REM has already been used in many applications in protein systems [[TB], |77|, [7^, 
79, |81| . Systems of Lennard- Jones particles have also been studied by this method in 
various ensembles P2[-||85||. Moreover, REM was applied to cluster studies in quantum 
chemistry field |86| . The details of molecular dynamics algorithm have been worked out 
for REM |78| (see also Refs. |76 , jB^Q. We then developed a multidimensional REM which 

see also Refs. 



88, 82, p) 



is particularly useful in free energy calculations 

However, REM also has a computational difficulty: As the number of degrees of 
freedom of the system increases, the required number of replicas also greatly increases, 
whereas only a single replica is simulated in MUCA or ST. This demands a lot of computer 
power for complex systems. Our solution to this problem is: Use REM for the weight 
factor determinations of MUCA or ST, which is much simpler than previous iterative 
methods of weight determinations, and then perform a long MUCA or ST production 
run. The first example is the replica- exchange multicanonical algorithm (REMUCA) f90j . 
In REMUCA, a short replica-exchange simulation is performed, and the multicanonical 
weight factor is determined by the multiple-histogram reweighting techniques [0, [|. An- 
other example of such a combination is the replica- exchange simulated tempering (REST) 
9T|. In REST, a short replica-exchange simulation is performed, and the simulated tem- 



pering weight factor is determined by the multiple-histogram reweighting techniques || |3| . 

We have introduced a further extension of REMUCA, which we refer to as multi- 
canonical replica- exchange method (MUCAREM) |K|. In MUCAREM, the multicanonical 
weight factor is first determined as in REMUCA, and then a replica-exchange multicanon- 
ical production simulation is performed with a small number of replicas. 

In this article, we describe the six generalized-ensemble algorithms mentioned above. 
Namely, we first review three familiar methods: MUCA, ST, and REM. We then present 
the three new algorithms: REMUCA, REST, and MUCAREM. The effectiveness of these 
methods is tested with short peptide systems. 



2 GENERALIZED-ENSEMBLE ALGORITHMS 
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2.1 Multicanonical Algorithm and Simulated Tempering 

Let us consider a system of N atoms of mass rrik (k = 1, ■ • • , N) with their coordinate 
vectors and momentum vectors denoted by q = {q 1: ■ ■ ■ , q N } and p = {p ly ■ ■ ■ ,p N }, 
respectively. The Hamiltonian H(q,p) of the system is the sum of the kinetic energy 
K(p) and the potential energy E(q): 

H(q,p)=K(p) + E(q) , (1) 

where 

N 2 

Kip) = E IT- ■ (2) 

In the canonical ensemble at temperature T each state x = (q,p) with the Hamiltonian 
H(q,p) is weighted by the Boltzmann factor: 

W B (x;T) = e - pH to*> , (3) 

where the inverse temperature (3 is defined by (3 = l/k^T (k-Q is the Boltzmann constant). 
The average kinetic energy at temperature T is then given by 

^))r={glt) T =\^T. (4) 

Because the coordinates q and momenta p are decoupled in Eq. ([!]), we can suppress 
the kinetic energy part and can write the Boltzmann factor as 

W B (x;T)=W B (E;T) = e- f>E . (5) 

The canonical probability distribution of potential energy P B (E;T) is then given by the 
product of the density of states n(E) and the Boltzmann weight factor Wb(E;T): 

P B (E;T)(xn(E)W B (E;T) . (6) 

Since n(E) is a rapidly increasing function and the Boltzmann factor decreases expo- 
nentially, the canonical ensemble yields a bell-shaped distribution which has a maximum 
around the average energy at temperature T. The conventional MC or MD simulations at 
constant temperature are expected to yield P B (E;T), but, in practice, it is very difficult 
to obtain accurate canonical distributions of complex systems at low temperatures by 
conventional simulation methods. This is because simulations at low temperatures tend 
to get trapped in one or a few of local-minimum-energy states. 

In the multicanonical ensemble (MUCA) H, U, on the other hand, each state is 
weighted by a non-Boltzmann weight factor W mn (E) (which we refer to as the multi- 
canonical weight factor) so that a uniform energy distribution P mn (E) is obtained: 

P mu (E) oc n(E)W mu (E) = constant. (7) 

The flat distribution implies that a free random walk in the potential energy space is real- 
ized in this ensemble. This allows the simulation to escape from any local minimum-energy 
states and to sample the configurational space much more widely than the conventional 
canonical MC or MD methods. 
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From the definition in Eq. (0) the multicanonical weight factor is inversely proportional 
to the density of states, and we can write it as follows: 



W mu (E) = e -A^ mu (£;T ) 



n(E) 



where we have chosen an arbitrary reference temperature, T = 1//cbA), an d the "multi- 
canonical potential energy" is defined by 

E mu (E; T ) = k B T \nn(E) = T S(E) . (9) 

Here, S(E) is the entropy in the microcanonical ensemble. Since the density of states of 
the system is usually unknown, the multicanonical weight factor has to be determined 
numerically by iterations of short preliminary runs [|], [5] as described in detail below. 

A multicanonical Monte Carlo simulation is performed, for instance, with the usual 
Metropolis criterion ||92|| : The transition probability of state x with potential energy E to 
state x' with potential energy E' is given by 

w(x x ') = { 1 > for AE ™ ^ > no) 

1 ' \ exp (-f3 AE mu ) , for AE mu > , [W) 

where 

AE mu = E mu (E'; T ) — E mu (E; T ) . (11) 

The molecular dynamics algorithm in multicanonical ensemble also naturally follows from 
Eq. (D), in which the regular constant temperature molecular dynamics simulation (with 
T = Tq) is performed by solving the following modified Newton equation: [EBl ^7] 



dE mn (E;T ) dE mu (E; T ) 
Pk ~ ^ " dE fk ' (12) 

where f k is the usual force acting on the k-th atom (k = 1, • • • , N). From Eq. ([]) this 
equation can be rewritten as 

Pk = yTeS ^ k ' 

where the following thermodynamic relation gives the definition of the "effective temper- 
ature" T(E): 

= , (14) 

E=E a 1 



dS(E) 



dE 
with 



E a = <E> T[Ea) . (15) 

The multicanonical weight factor is usually determined by iterations of short trial 
simulations. The details of this process are described, for instance, in Refs. [It], ^Jj. For 
the first run, a canonical simulation at a sufficiently high temperature To is performed, 
i.e., we set 

/ T ) = E, 

\ WW(E;T ) = W B (E;T ) = exp(-(3 E) . {W) 
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We define the maximum energy value E max under which we want to have a flat energy 
distribution by the average potential energy at temperature T : 

Eraax =< E >T ■ (17) 

Above -E max we have the canonical distribution at T = To. In the £-th iteration a simula- 
tion with the weight W^l(E; T ) = exp (—foE^^E; T )) is performed, and the histogram 

(E) of the potential energy distribution P^l(E) is obtained. Let E^ m be the lowest- 
energy value that was obtained throughout the preceding iterations including the present 
simulation. The multicanonical weight factor for the {£ + l)-th iteration is then given by 



E^ l \E-T 



E , for E > £ max , 

EM(E; To) + k B T \nN^(E) - c« , for E^ D < E < E, 

dE^\E-T 



max > 



dE 



,(() 



E - ESI) + E^(Ei2 n ; To), for E < E%L, 



(18) 

where the constant is introduced to ensure the continuity at E = E max and we have 

c® = k B T ]nNW(E max ) . (19) 

We iterate this process until the obtained energy distribution becomes reasonably flat, 
say, of the same order of magnitude, for E < E max . When the convergence is reached, we 
should have that E^ m is equal to the global-minimum potential energy value. 

It is also common especially when working in MD algorithm to use polynomials and 
other smooth functions to fit the histograms during the iterations [^, |27|, We have 
shown that the cubic spline functions work well [90|. 

However, the iterative process can be non-trivial and very tedius for complex systems, 
and there have been attempts to accelerate the convergence of the iterative process [II], 
25[ H, H, |5|, |. 

After the optimal multicanonical weight factor is determined, one performs a long 
multicanonical simulation once. By monitoring the potential energy throughout the sim- 
ulation, one can find the global-minimum-energy state. Moreover, by using the obtained 
histogram N mn (E) of the potential energy distribution P mu (E) the expectation value of a 
physical quantity A at any temperature T = I/U-qP is calculated from 

]T A(E) n(E) e- 0E 
<A> ^ E Y.n(E)e^ • (20) 

E 

where the best estimate of the density of states is given by the single-histogram reweighting 
techniques (see Eq. (|7|)) [0: 

"<^W (21 » 

In the numerical work, we want to avoid round-off errors (and overflows and underflows) 
as much as possible. It is usually better to combine exponentials as follows (see Eq. (|8|)): 

J2 A{E) N mn (E) e PoE m «(E:T )-PE 

< A >T= E ^ NUE) e p oElME , To) ^E ■ ( 22 ) 
E 
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We now briefly review the original simulated tempering (ST) method [46, 47]. In this 
method temperature itself becomes a dynamical variable, and both the configuration and 
the temperature are updated during the simulation with a weight: 

W ST {E;T) = e - f3E+a ^ , (23) 

where the function a(T) is chosen so that the probability distribution of temperature is 
flat: 

P ST (T) = J dE n(E) W ST (E;T) = J dE n(E) e^ E+a( - T) = constant . (24) 

Hence, in simulated tempering the temperature is sampled uniformly. A free random walk 
in temperature space is realized, which in turn induces a random walk in potential energy 
space and allows the simulation to escape from states of energy local minima. 

In the numerical work we discretize the temperature in M different values, T m (m = 
1, • ■ ■ , M). Without loss of generality we can order the temperature so that T\ < T 2 < 
• ■ • < Tm- The lowest temperature T\ should be sufficiently low so that the simulation 
can explore the global-minimum-energy region, and the highest temperature Tm should 
be sufficiently high so that no trapping in an energy-local-minimum state occurs. The 
probability weight factor in Eq. (p3 ) is now written as 



W S r(E;T m ) = e ~ PmE+am , (25) 

where a m = a(T m ) (m = 1, ■ • • , M). The parameters a m are not known a priori and have 
to be determined by iterations of short simulations. This process can be non-trivial and 
very difficult for complex systems. Note that from Eqs. fl2"4|) and we have 

e~ am oc 



J dE n(E) e~ PmE . (26) 



The parameters a m are therefore "dimensionless" Helmholtz free energy at temperature 
T m (i.e., the inverse temperature (3 m multiplied by the Helmholtz free energy). 

Once the parameters determined and the initial configuration and the initial 

temperature T m are chosen, a simulated tempering simulation is then realized by alter- 



nately performing the following two steps |46|, 17] : 



1. A canonical MC or MD simulation at the fixed temperature T m is carried out for a 
certain MC or MD steps. 

2. The temperature T m is updated to the neighboring values T m±1 with the configura- 
tion fixed. The transition probability of this temperature-updating process is given 
by the Metropolis criterion (see Eq. (|25|)): 



w(T rn -> T m±l ) = I ^ ( _ A) for A > | (27) 

where 

A = (A„±i — (3 m ) E — (a m ±i — a m ) . (28) 

Note that in Step 2 we exchange only pairs of neighboring temperatures in order to secure 
sufficiently large acceptance ratio of temperature updates. 
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As in multicanonical algorithm, the simulated tempering parameters a m = a(T m 
(m = 1,---,M) are also determined by iterations of short trial simulations (see, e.g. 



Refs. |48], [49], [51f for details). Here, we give the one in Ref. |5T . 

During the trial simulations we keep track of the temperature distribution as a his- 
togram N m = N(T m ) (m = 1, • • ■ , M). 

1. Start with a short canonical simulation (i.e., a m = 0) updating only configurations 
at temperature T m = Tm (we initially set the temperature label m to M) and 
calculate the average potential energy < E >t m - Here, the histogram N n will have 
non-zero entry only for n = m = M. 

2. Calculate new parameters a n according to 

a n — In N n , for m < n < M , 

a n - < E > Tm (/5 m _i - p rn ) , for n = m - 1 , (29) 
— oo , for n < m — 1 . 

This weight implies that the temperature will range between T m _i and Tm- 

3. Start a new simulation, now updating both configurations and temperatures, with 
weight WsT{E;T n ) = e~ t3nE+a " and sample the distribution of temperatures T n in 
the histogram N n = N(T n ). For T = T m _i calculate the average potential energy 
< E > Tm _ 1 . 

4. If the histogram N n is approximately flat in the temperature range T m _i <T n < T M , 
set m — m — 1. Otherwise, leave m unchanged. 

5. Iterate the last three steps until the obtained temperature distribution N n becomes 
flat over the whole temperature range [T\,Tm]- 

After the optimal simulated tempering weight factor is determined, one performs a long 
simulated tempering run once. From the results of this production run, one can obtain 
the canonical ensemble average of a physical quantity A as a function of temperature 
from Eq. (p0|), where the density of states is given by the multiple-histogram reweighting 
techniques 0, |3| as follows. Let N m (E) and n m be respectively the potential-energy 
histogram and the total number of samples obtained at temperature T m = 1/ &bAd (tu = 
1, • ■ ■ , M). The best estimate of the density of states is then given by [|]. [| 

M 

£ g' 1 N m (E) 
<E) = , (30) 



Tl m 6 



Jm — Pm.E 



m=l 

where 



e 



-f m = J- n {E) e-^ E . (31) 

E 

Here, g m = 1 + 2r m , and r m is the integrated autocorrelation time at temperature T m . 
Note that Eqs. © and (|l|) are solved self-consistently by iteration [^, ||| to obtain the 



dimensionless Helmholtz free energy f m (and the density of states n(E)). We remark that 



S 



in the numeraical work, it is often more stable to use the following equations instead of 
Eqs. and (^TJ): 



M 



£ g- 1 N m (E) 

P B (E; T) = n(E)e^ E = , (32) 



Tim, & 



fm-(P m -P)E 



771=1 

where 



e 



~ fm =E PB(E;T m ) . (33) 

E 

The equations are solved iteratively as follows. We can set all the f m (m = 1, • • ■ , M) to, 
e.g., zero initially. We then use Eq. (|32| ) to obtain Pb(E; T m ) (m = 1, ■ • • , M), which are 
substituted into Eq. ( |3~3"D to obtain next values of f m , and so on. 



2.2 Replica- Exchange Method 



The replica- exchange method (REM) |71||-|7^| was developed as an extension of simulated 



tempering |7lJ (thus it is also referred to as parallel tempering [Q) (see, e.g., Ref. |7£ 
for a detailed description of the algorithm). The system for REM consists of M non- 
interacting copies (or, replicas) of the original system in the canonical ensemble at M 
different temperatures T m (m = 1,---,M). We arrange the replicas so that there is 
always exactly one replica at each temperature. Then there is a one-to-one correspondence 
between replicas and temperatures; the label i {i — 1, • • • , M) for replicas is a permutation 
of the label m (m — 1, ■ ■ ■ , M) for temperatures, and vice versa: 

| i = i{m) = f{m) , 

where f(m) is a permutation function of m and / _1 («) is its inverse. 

Let X = ja?! , • • ■ , ^m M ^} = { x m(i)5 ' ' ' 5 x rn(M)} stand for a "state" in this general- 
ized ensemble. The state X is specified by the M sets of coordinates and momenta 
pW of N atoms in replica i at temperature T m : 



• (35) 



Because the replicas are non-interacting, the weight factor for the state X in this 
generalized ensemble is given by the product of Boltzmann factors for each replica (or at 
each temperature): 



m \ ( m 



^rem(X) = exp -£/? mW tf (V l] ,/ ] ) = exp - ]T (3 m H (V'Ml^Ml) , (36) 



7 = 1 ) \ 777=1 



where i(m) and m(i) are the permutation functions in Eq. (0). 

We now consider exchanging a pair of replicas in the generalized ensemble. Suppose 
we exchange replicas i and j which are at temperatures T m and T n , respectively: 

X — j- ■ ■ , x®, ■ ■ ■ , x]i\ ■ ■ -j > X = |- • ■ , x\£ , ■ ■ • , x$ , • ■ -j . (37) 
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Here, i, j, m, and n are related by the permutation functions in Eq. fl3~4|), and the exchange 
of replicas introduces a new permutation function /': 



i 
J 



f(™>) 
/(«) 



► j = f'( m ) 
i = f(n) . 



(3* 



The exchange of replicas can be written in more detail as 



a$ = (g [,1 ,p w ) 

3$ = [qW,pW 



(39) 



where the definitions for pM' and p^'l' will be given below. We remark that this process is 
equivalent to exchanging a pair of temperatures T m and T n for the corresponding replicas 
i and j as follows: 



3$ = (gW,p^) 

V / ? 



(40) 



In the original implementation of the replica- exchange method (REM) |71|-||74|| , Monte 
Carlo algorithm was used, and only the coordinates q (and the potential energy function 
E(q)) had to be taken into account. In molecular dynamics algorithm, on the other 
hand, we also have to deal with the momenta p. We proposed the following momentum 
assignment in Eq. (|39|) (and in Eq. (fE|)) [f78fl : 



P 1 ' 




P 



\j}> 



P 1 ' 



T 




(41) 



which we believe is the simplest and the most natural. This assignment means that we 
just rescale uniformly the velocities of all the atoms in the replicas by the square root of 
the ratio of the two temperatures so that the temperature condition in Eq. (^) may be 
satisfied. 

In order for this exchange process to converge towards an equilibrium distribution, 
it is sufficient to impose the detailed balance condition on the transition probability 

W REM (X) w{X X') = W REM (X') w{X' X) . (42) 
From Eqs. (0), (@), ©, ©, and (g2|), we have 

w(X -> X') 



w(X' -> X) 



ex V {-P m [K(pW)+E(q 

E(q\ 



+Pm K(p 

T, 



+ /3 n \K(pW 



E(f)]} , 



exp {-pJ^K (pW) - p n ^K + (3 m K (pW) + /3 n *T 



-/5 m [E (gW) - £ (V 
exp(-A) , 



/3n # 9 



fl(gM)]} 



(43) 
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where 



and i, j, m, and n are related by the permutation functions (in Eq. 
exchange: 

/ i = f(m) , 
I j = f(n) • 



This can be satisfied, for instance, by the usual Metropolis criterion 



w(X -> X') 



w 



X 



1 , for A < 

exp (-A) , for A > 



(44) 

j])) before the 
(45) 



(46) 



where in the second expression (i.e., w(a;W|a;^)) we explicitly wrote the pair of replicas 
(and temperatures) to be exchanged. Note that this is exactly the same criterion that 
was originally derived for Monte Carlo algorithm |TT|-||74|. 

Without loss of generality we can again assume T\ < T 2 < • • • < Tm- A simulation of 
the replica- exchange method (REM) |7T|-[74] is then realized by alternately performing 
the following two steps: 

1. Each replica in canonical ensemble of the fixed temperature is simulated simultaneously 
and independently for a certain MC or MD steps. 

2. A pair of replicas at neighboring temperatures, say x^] and x^ +1 , are exchanged 



with the probability w {x\ 



x 



bl 

m+l 



in Eq. 



Note that in Step 2 we exchange only pairs of replicas corresponding to neighboring 
temperatures, because the acceptance ratio of the exchange decreases exponentially with 
the difference of the two /3's (see Eqs. ([0]) and (|46|)). Note also that whenever a replica 
exchange is accepted in Step 2, the permutation functions in Eq. (|34]) are updated. 

The REM simulation is particularly suitable for parallel computers. Because one can 
minimize the amount of information exchanged among nodes, it is best to assign each 
replica to each node (exchanging pairs of temperature values among nodes is much faster 
than exchanging coordinates and momenta). This means that we keep track of the per- 
mutation function m(i; t) = f~ l (i; t) in Eq. fl34|) as a function of MC or MD step t during 
the simulation. After parallel canonical MC or MD simulations for a certain steps (Step 
1), M/2 pairs of replicas corresponding to neighboring temperatures are simulateneously 
exchanged (Step 2), and the pairing is alternated between the two possible choices, i.e., 
(T l5 T 2 ), (T 3 ,T 4 ), •••and (T 2 ,T 3 ), (T 4 ,T 5 ), ••, 

The major advantage of REM over other generalized-ensemble methods such as multi- 
canonical algorithm [|], |5| and simulated tempering |4(|, ^] lies in the fact that the weight 
factor is a priori known (see Eq. (0)), while in the latter algorithms the determination of 
the weight factors can be very tedius and time-consuming. A random walk in "tempera- 
ture space" is realized for each replica, which in turn induces a random walk in potential 
energy space. This alleviates the problem of getting trapped in states of energy local 
minima. In REM, however, the number of required replicas increases as the system size 
N increases (according to VN) |7l| . This demands a lot of computer power for complex 
systems. 
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Replica- Exchange Multicanonical Algorithm and Replica- 
Exchange Simulated Tempering 



2.3 



The replica- exchange multicanonical algorithm (REMUCA) |K| overcomes both the dif- 
ficulties of MUCA (the multicanonical weight factor determination is non-trivial) and 
REM (a lot of replicas, or computation time, is required). In REMUCA we first perform 
a short REM simulation (with M replicas) to determine the multicanonical weight factor 
and then perform with this weight factor a regular multicanonical simulation with high 
statistics. The first step is accomplished by the multiple-histogram reweighting techniques 
0, [|. Let N m (E) and n m be respectively the potential-energy histogram and the total 
number of samples obtained at temperature T m = l/k&(3 m of the REM run. The density 
of states n(E) is then given by solving Eqs. ([3(]) and fl3T|) self-consistently by iteration 



Once the estimate of the density of states is obtained, the multicanonical weight factor 
can be directly determined from Eq. (|8]) (see also Eq. @). Actually, the multicanonical 
potential energy, E mu (E;To), thus determined is only reliable in the following range: 

Ei < E < Em , (47) 



(4£ 



where 

f Ek = <E> Tl 
\ E M = < E > Tm , 

and T\ and Tm are respectively the lowest and the highest temperatures used in the REM 
run. Outside this range we extrapolate the multicanonical potential energy linearly: 

' dE mu (E; T ) 



4°u } (£) 




(E — E x ) + E mn (Ei, T 



(E — E M ) + E mu (E M ; T 



for E < E x , 
for Ex<E< E M , 
for E > E M - 



(49) 



The multicanonical MC and MD runs are then performed with the Metropolis criterion 
of Eq. ([II]) and with the Newton equation in Eq. QI2D, respectively, in which £^(E) 
in Eq. (|49"D is substituted into i£ mu (i£; T ). We expect to obtain a flat potential energy 
distribution in the range of Eq. fl4"7|) . Finally, the results are analyzed by the single- 
histogram reweighting techniques as described in Eq. (|21~D (and Eq. (|20|) ) . 

Some remarks are now in order. From Eqs. (§), fli~4]) , (|15|), and (£H), Eq. (|49D becomes 



£l u } (£) 



(E - Ex) 



Tx 

T S(E) , 

—(E - 
Tm 



T S(Ex 



Tx 



E + constant 



E 



M> 



T S(E 



T n 



T 



E + constant 



for E < Ex=< E > Tl , 

for Ex<E< E M , 

for E > E M =< E > Tm . 

(50) 



The Newton equation in Eq. 



Pk 



12J) is then written as (see Eqs. (eqn9b), (|14D , and (p!q)) 
To 
T 



f k , for E < Ex, 



Jo 



T(E) 

n 

Ti J k 
M 



f k , for Ex < E < E 



(51) 



foi E> E 



M- 
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Because only the product of inverse temperature (3 and potential energy E enters in the 
Boltzmann factor (see Eq. (0)), a rescaling of the potential energy (or force) by a constant, 
say a, can be considered as the rescaling of the temperature by a" 1 jB^] . Hence, our 
choice of E^(E) in Eq. © results in a canonical simulation at T = 7\ for E < Ei, a 
multicanonical simulation for E\ < E < Em, and a canonical simulation at T = Tm for 
E > Em- Note also that the above arguments are independent of the value of T , and we 
will get the same results, regardless of its value. 

Finally, although we did not find any difficulty in the case of protein systems that we 
studied, a single REM run in general may not be able to give an accurate estimate of the 
density of states (like in the case of a first-order phase transition J7TJ). In such a case we 
can still greatly simplify the process of the multicanonical weight factor determination by 



combining the present method with the previous iterative methods [10, 21, 25, 54, pS 



We finally present the new method which we refer to as the replica- exchange simulated 
tempering (REST) ||91|| . In this method, just as in REMUCA, we first perform a short 
REM simulation (with M replicas) to determine the simulated tempering weight factor 
and then perform with this weight factor a regular ST simulation with high statistics. 
The first step is accomplished by the multiple-histogram reweighting techniques |fj. 
which give the dimensionless Helmholtz free energy f m (see Eqs. (|30|) and (|31"D). 

Once the estimate of the dimensionless Helmholtz free energy f m are obtained, the 
simulated tempering weight factor can be directly determined by using Eq. (|25|) where 
we set a m = f m (compare Eqs. (^) and (|3~TD ). A long simulated tempering run is then 
performed with this weight factor. Let N m (E) and n m be respectively the potential-energy 
histogram and the total number of samples obtained at temperature T m = l/k-Q^m from 
this simulated tempering run. The multiple-histogram reweighting techniques of Eqs. ( |30| ) 



and fl3T|) can be used again to obtain the best estimate of the density of states n(E). The 
expectation value of a physical quantity A at any temperature T (= l/k-B/3) is then 
calculated from Eq. (^Op. 

The formulations of REMUCA and REST are simple and straightforward, but the 
numerical improvement is great, because the weight factor determination for MUCA and 
ST becomes very difficult by the usual iterative processes for complex systems. 



2.4 Multicanonical Replica- Exchange Method 

In the previous subsection we presented a new generalized-ensemble algorithm, REMUCA, 
that combines the merits of replica-exchange method and multicanonical algorithm. In 
REMUCA a short REM simulation with M replicas are first performed and the results 
are used to determine the multicanonical weight factor, and then a regular multicanon- 
ical production run with this weight is performed. The number of replicas, M, that is 
required in the first step should be set minimally as long as a random walk between the 
lowest-energy region and the high-energy region is realized. This number can still be very 
large for complex systems. This is why the (multicanonical) production run in REMUCA 
is performed with a "single replica." While multicanonical simulatoins are usually based 
on local updates, a replica-exchange process can be considered to be a global update, and 
global updates enhance the sampling further. Here, we present a further modification of 
REMUCA and refer to the new method as multicanonical replica- exchange method (MU- 
CAREM) [Q. In MUCAREM the final production run is not a regular multicanonical 
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simulation but a replica-exchange simulation with a few replicas, say Ai replicas, in the 
multicanonical ensemble. (We remark that replica-exchange simulations based on the 
generalized ensemble with Tsallis weights were introduced in Ref. ]F(J.) Because multi- 
canonical simulations cover much wider energy ranges than regular canonical simulations, 
the number of required replicas for the production run of MUCAREM is much less than 
that for the regular REM (Ai <C M), and we can keep the merits of REMUCA (and 
improve the sampling further). 

The details of MUCAREM are as follows. As in REMUCA, we first perform a short 
REM simulation with M replicas with M different temperatures (we order them as T\ < 
T 2 < • • • < Tm) and obtain the best estimate of the density of states n(E) in the whole 
energy range of interest (see Eq. (f47[)) by the multiple-histogram reweighting techniques 
of Eqs. fl3~0|) and (pi]) . We then choose a number Ai (Ai <C M) and assign Ai pairs 



of temperatures (T^ , T } 



{m} rp{m}^ 



II 



,Ai). Here, we assume that < T } 



n{m} 



and arrange the temperatures so that the neighboring regions covered by the pairs have 
sufficient overlaps. In particular, we set = T\ and T^ M ^ = Tm- We then define the 
following quantities: 



E 
E 



{m} 
L 

{m} 



< E > {m} 

< E >„{m} 



We also choose Ai (arbitrary) temperatures T„ 
multicanonical potential energies: 



[m 



(m 



Ai) . 



(52) 



Ai) and assign the following 




(E - Et } ) + E mu {Er- T m ) , for E < E[ 



-,{m} 



for El m} 



(E-E 



{m}> 



E mu (4 m} ;T m ) , for E > E 



<E< E^\ 

{m} 



(53) 

where E mu (E; T) is the multicanonical potential energy that was determined for the whole 
energy range of Eq. (|4"T|). As remarked around Eq. (|5*UD, our choice of S^(E) in Eq. 
results in a canonical simulation at T = T, 



{m} 



for E[ m} 



< E < E 



{m} 



for E < El m \ a multicanonical simulation 



and a canonical simulation at T = T, 



{m} 



for E > E^\ 



The production run of MUCAREM is a replica-exchange simulation with Ai replicas 
with A4 different temperatures T m and multicanonical potential energies 8^(E). By fol- 
lowing the same derivation that led to the original REM, we have the following transition 
probability of replica exchange of neighboring temperatures (see Eqs. (H3f) and (P])): 



w 



m+l 



1 , for A < 

exp (-A) , for A > 



(54) 



where 



a = /w {4r 1} (e (?'«)) - 4u +1} (e (^ ] ))}-P m (e ( g w)) - 4- } (e ( g W))} 

(55) 

Note that we need to newly evaluate the multicanonical potential energy, S£ffi(E(qw)) 
and S^ +1 ^(E(q^)), because S^(E) and Sj^(E) are, in general, different functions for 
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m 7^ n. We remark that the same additional evaluation of the potential energy is necessary 
for the multidimensional replica-exchange method pp| . 

For obtaining the canonical distributions, the multiple-histogram reweighting tech- 
niques [0, |3| are again used. Let N m (E) and n m be respectively the potential-energy 
histogram and the total number of samples obtained at T. m with the multicanonical po- 
tential energy E^(E) (m = 1, • ■ ■ , A4). The expectation value of a physical quantity A 
at any temperature T = l/k^fi is then obtained from Eq. (^), where the best estimate 
of the density of states is given by solving the multiple-histogram reweighting equations, 
which now read 

M 

E 9m N m (E) 

<E) = — , (56) 

y m ""rn 

m=l 

and 

e -f m n {E) e - Pm£ ™ {E) . (57) 

E 



3 EXAMPLES OF SIMULATION RESULTS 



We now present some examples of the simulation results by the algorithms described in 
the previous section. A few short peptide systems were considered. 

For Monte Carlo simulations, the potential energy parameters were taken from ECEPP/2 
93|-|95|. The generalized-ensemble algorithms were implemented in the computer code 
KONF90 [96, 97] for the actual simulations. Besides gas phase simulations, various 
solvation models have been incorporated. The simplest one is the sigmoidal, distance- 
dependent dielectric function p8l The explicit form of the function we used is given 



in Ref. |100|| , which is a slight modification of the one in Ref. [ |101|| . A second (and more 
accurate) model that represents solvent contributions is the term proportional to the 
solvent-accessible surface area of solute molecule. The parameters we used are those of 
Ref. [ IU2 |. For the calculation of solvent-accessible surface area, we used the computer 
code NSOL ||103|| , which is based on the code NSC ||104|| . The third (and most rigorous) 
method that represents solvent effects is based on the reference interaction site model 
(RISM) ||105|| - fT07|l . The model of water molecule that we adopted is the SPC/E model 
||108|| . A robust and fast algorithm for solving RISM equations was recently developed 



109 1, which we employed in our calculations. 



For molecular dynamics simulations, the force-field parameters were taken from the all 
atom versions of AMBER |110| |- |TT2| . The computer code developed in Refs. [113, |1 14 



which is based on PRESTO ||1 1 5|| , was used. The unit time step was set to 0.5 fs. The 



temperature during the canonical MD simulations was controlled by the constraint method 



116| , |117|| . Besides gas phase simulations, we have also performed MD simulations with 



explicit water molecules of TIP3P model []1 18 



As described in detail in the previous section, in generalized-ensemble simulations 
and subsequent analyses of the data, potential energy distributions have to be taken as 
histograms. For the bin size of these histograms, we used the values ranging from 0.5 to 
2 kcal/mol, depending on the system studied. 
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We first illustrate how effectively generalized-ensemble simulations can sample the 
configurational space compared to the conventional simulations in the canonical ensem- 
ble. It is known by experiments that the system of a 17- residue peptide fragment from 
ribonuclease Tl tends to form a-helical conformations 



1 19[ 



We have performed both a 
canonical MC simulation of this peptide at a low temperature (T = 200 K) and a multi- 
canonical MC simulation 



120 1 . In Figure 1 we show the time series of potential energy 



from these simulations. 



(a) 



(b) 
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Figure 1: Time series (from 120,000 MC sweeps to 300,000 MC sweeps) of potential 
energy of the peptide fragment of ribonuclease Tl from (a) a conventional canonical MC 
simulation at T = 200 K and (b) a multicanonical MC simulation. 

We see that the canonical simulation thermalize very slowly. On the other hand, the 
MUCA simulation indeed performed a random walk in potential energy space covering a 
very wide energy range. Four conformations chosen during this period (from 120,000 MC 
sweeps to 300,000 MC sweeps) are shown in Figure 2 for the canonical simulation and 
in Figure 3 for the MUCA simulation. We see that the MUCA simulation samples much 
wider conformational space than the conventional canonical simulation. 

The next examples of the systems that we studied by multicanonical MC simulations 
are homo-oligomer systems. We studied the helix-forming tendencies of three amino-acid 
homo-oligomers of length 10 in gas phase |20], |21| and in aqueous solution (the solvent 
effects are represented by the term that is proportional to solvent-accessible surface area) 



42|j . Three characteristic amino acids, alanine (helix former), valine (helix indifferent), 



and glycine (helix breaker) were considered. In Figure 4 the lowest-energy conformations 



obtained both in gas phase and in aqueous solution by MUCA simulations are shown f42 |. 
The lowest-energy conformations of (Ala) in (Figures 2(a) and 2(b)) have six intrachain 
backbone hydrogen bonds that characterize the a-helix and are indeed completely helical. 
Those of (Val) io are also in almost ideal helix state (from residue 2 to residue 9 in gas 
phase and from residue 2 to residue 8 in aqueous solution). On the other hand, those of 
(Gly)io are not helical and rather round. 

We calculated the average values of the total potential energy and its component terms 



of (Ala)io as a function of temperature both in gas phase and in aqueous solution [42]. The 
results are shown in Figure 5. For homo-alanine in gas phase, all the conformational energy 
terms increase monotonically as temperature increases. The changes of each component 
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Figure 2: Typical snapshots from the canonical MC simulation of Figure 1(a). The figures 
were created with Molscript ||121|| and Raster3D ||122| , |123| . 
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(a) I 

{ 






Figure 4: The lowest-energy conformations of (Ala)io (( a ) an d (b)), (Val)io ((c) and (d)), 
and (Gly)io ((e) and (f)) obtained from the multicanonical MC simulations in gas phase 
and in aqueous solution, respectively. The figures were created with Molscript [|121|| and 
Raster3D il22l Il23l 



19 



terms are very small except for the Lennard- Jones term, E v , indicating that E v plays an 
important role in the folding of homo-alanine ET] . 
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Figure 5: Average of the total potential energy E tot and averages of its component terms, 
electrostatic energy E c , hydrogen-bond energy Eh, Lennard- Jones energy E v , torsion en- 
ergy Et, and solvation free energy E so \ (only for the case in aqueous solution) for homo- 
alanine as a function of temperature T (a) in gas phase and (b) in aqueous solution. The 
values for each case were calculated from one multicanonical production run of 1,000,000 
MC sweeps by the single-histogram reweighting techniques. 

In aqueous solution the overall behaviors of the conformational energy terms are very 
similar to those in gas phase. The solvation term, on the other hand, decreases mono- 
tonically as temperature increases. These results imply that the solvation term favors 
random-coil conformations, while the conformational terms favor helical conformations. 

The rapid changes (decrease for the solvation term and increase for the rest of the 
terms) of all the average values occur at the same temperature (around at 420 K in gas 
phase and 340 K in solvent). We thus calculated the specific heat for (Ala)io as a function 
of temperature. The specific heat here is defined by the following equation: 

C(T) = (3 2 <EL >t-< E ^>t\ (58) 

where N (= 10) is the number of residues in the oligomer. In Figure 6 we show the results. 
We observe sharp peaks in the specific heat for both environment. The temperatures at 
the peak, helix-coil transition temperatures, are T c w 420 K and 340 K in gas phase and 
in aqueous solution, respectively. 

We calculated the average number of helical residues < n >t in a conformation as a 
function of temperature. In Figure 7 we show the average helicity < n >t as a function of 
temperature for the three homo-oligomers in aqueous solution. The average helicity tends 
to decrease monotonically as the temperature increases because of the increased thermal 
fluctuations. 

At T = 200 K, < n for homo-alanine is 8. If we neglect the terminal residues, 
in which a-helix tends to be frayed, n = 8 corresponds to the maximal helicity, and 
the conformation can be considered completely helical. The homo-alanine is thus in an 
ideal helical structure at T = 200 K. Around the room temperature, the homo-alanine 
is still substantially helical (~ 70 % helicity). This is consistent with the experimental 
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Figure 6: Specific heat C as a function of temperature T for (Ala)io in gas phase and 
in aqueous solution. The values for each case were calculated from one multicanonical 
production run of 1,000,000 MC sweeps by the single-histogram reweighting techniques. 

fact that alanine is a strong helix former. We observe that < n >t is 5 (50 % helicity) at 
the transition temperature obtained from the peak in specific heat (around 340 K). This 
implies that the peak in specific heat indeed implies a helix-coil transition between an 
ideal helix and a random coil. 

The next example is a penta peptide, Met-enkephalin, whose amino-acid sequence is: 
Tyr-Gly-Gly-Phe-Met. Since this is one of the simplest peptides with biological functions, 
it served as a bench mark system for many simulations. 

Here, we present the latest results of a multicanonical MC simulation of Met-enkephalin 



in gas phase | 3~8fl . The conformations were classified into six groups of similar structures 
according to their intra-chain hydrogen bonds. In Figure 8 we show the lowest-energy 
conformations in each group identified by the MUCA simulation. The lowest-energy con- 
formation of group C25 (Figure 8(a)) has two hydrogen bonds, connecting residues 2 
and 5, and forms a type II' /3-turn. The ECEPP/2 energy of the conformation is —12.2 
kcal/mol, and this conformation corresponds to the global-minimum-energy state of Met- 
enkephalin in gas phase. The lowest-energy conformation of group C14 (Figure 8(b)) has 
two hydrogen bonds, connecting residues 1 and 4, and forms a type II /3-turn. The energy 
is —11.1 kcal/mol, and this conformation corresponds to the second-lowest-energy state. 
Other groups correspond to high-energy states. 

We now study the distributions of conformations in these groups as a function of 
temperature. The results are shown in Figure 9. As can be seen in the Figure, group 
C25 is dominant at low temperatures. Conformations of group C14 start to appear from 
T pa 100 K. At T pa 300 K, the distributions of these two groups, C25 and C14, balance 
( ~ 25 % each) and constitute the main groups. Above T pa 300 K, the contributions of 
other groups become non-negligible (those of group C24 and group C13 are about 10 % 
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Figure 7: Average helicity < n > T as a function of temperature T for (Ala)io, (Val)io, 
and (Gly)io in aqueous solution. The values for each case were calculated from one mul- 
ticanonical production run of 1,000,000 MC sweeps by the single-histogram reweighting 
techniques. 



and 8 %, respectively, at T = 400 K). Note that the distribution of conformations that do 
not belong to any of the six groups monotonically increases as the temperature is raised. 
This is because random-coil conformations without any intrachain hydrogen bonds are 
favored at high temperatures. 

The same peptide in gas phase was studied by the replica-exchange MD simulation 
78]. We made an MD simulation of 2 x 10 6 time steps (or, 1.0 ns) for each replica, 



starting from an extended conformation. We used the following eight temperatures: 700, 
585, 489, 409, 342, 286, 239, and 200 K, which are distributed exponentially, following 
the annealing schedule of simulated annealing simulations [[57J . As is shown below, this 
choice already gave an optimal temperature distribution. The replica exchange was tried 
every 10 fs, and the data were stored just before the replica exchange for later analyses. 

As for expectation values of physical quantities at various temperatures, we used 
the multiple-histogram reweighting techniques of Eqs. (BO) and (3T). We remark that for 
biomolecular systems the integrated autocorrelation times, r m , in the reweighting formulae 
(see Eq. fl3~0D ) can safely be set to be a constant |§, and we do so throughout the analyses 
in this section. 

For an optimal performance of REM simulations the acceptance ratios of replica ex- 
change should be sufficiently uniform and large (say, > 10 %). In Table 1 we list these 
quantities. The values are indeed uniform (all about 15 % of acceptance probability) and 
large enough (more than 10 %). 

The results in Table 1 imply that one should observe a free random walk in temperature 
space. The results for one of the replicas are shown in Figure 10(a). We do observe a 
random walk in temperature space between the lowest and highest temperatures. In 
Figure 10(b) the corresponding time series of the total potential energy is shown. We see 
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Figure 8: The lowest-energy conformations in each group obtained by the multicanonical 
MC simulation of 1,000,000 MC sweeps. The lowest-energy conformations correspond to 
groups (a) C25, (b) C14, (c) C24, (d) C13, (e) C15, and (f) C35. The figures were created 
with RasMol g2|. 
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Figure 9: The distributions of each group of similar strucutres as a function of tempera- 
ture. 



Table 1: Acceptance Ratios of Replica Exchange Corresponding to Pairs of Neighboring 
Temperatures 



Pair of Temperatures (K) Acceptance Ratio 



200 <- 


— > 239 


0.160 


239 <- 


— > 286 


0.149 


286 <- 


— > 342 


0.143 


342 <- 


— * 409 


0.139 


409 «- 


— > 489 


0.142 


489 «- 


— » 585 


0.146 


585 «- 


— > 700 


0.146 
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that a random walk in potential energy space between low and high energies is realized. 
We remark that the potential energy here is that of AMBER in Ref. [ [L10|| . Note that 
there is a strong correlation between the behaviors in Figures 10(a) and 10(b). 
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Figure 10: Time series of (a) temperature exchange and (b) the total potential energy 
for one of the replicas from a replica-exchange MD simulation of Met-enkephalin in gas 
phase. 

In Figure 11 the canonical probability distributions obtained at the chosen eight tem- 
peratures from the replica-exchange simulation are shown. We see that there are enough 
overlaps between all pairs of distributions, indicating that there will be sufficient numbers 
of replica exchanges between pairs of replicas (see Table 1). 

We further compare the results of the replica-exchange simulation with those of a 
single canonical MD simulation (of 1 ns) at the corresponding temperatures. In Figure 12 
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Figure 11: The canonical probability distributions of the total potential energy of Met- 
enkephalin in gas phase obtained from the replica-exchange MD simulation at the eight 
temperatures. The distributions correspond to the following temperatures (from left to 
right): 200, 239, 286, 342, 409, 489, 585, and 700 K. 

we compare the distributions of a pair of dihedral angles (0, ip) of Gly-2 at two extreme 
temperatures (T = 200 K and 700 K). While the results at T = 200 K from the regular 
canonical simulation are localized with only one dominant peak, those from the replica- 
exchange simulation have several peaks (compare Figures 12(a) and 12(b)). Hence, the 
replica-exchange run samples much broader configurational space than the conventional 
canonical run at low temperatures. The results at T = 700 K (Figures 12(c) and 12(d)), 
on the other hand, are similar, implying that a regular canonical simulation can give 
accurate thermodynamic quantities at high temperatures. 

In Figure 13 we show the average total potential energy as a function of temperature. 
As expected from the results of Figure 12, we observe that the canonical simulations at low 
temperatures got trapped in states of energy local minima, resulting in the discrepancies 
in average values between the results from the canonical simulations and those from the 
replica-exchange simulation. 

We now present the results of MD simulations based on replica-exchange multicanon- 
ical algorithm and multicanonical replica-exchange method pD[ . The Met-enkephalin 
in gas phase was studied again. The potential energy is, however, that of AMBER in 
Ref. [p.ll[| instead of Ref. [[TTTJ. In Table 2 we summarize the parameters of the simu- 
lations that were performed. As discussed in the previous section, REMUCA consists 
of two simulations: a short REM simulation (from which the density of states of the 
system, or the multicanonical weight factor, is determined) and a subsequent production 
run of MUCA simulation. The former simulation is referred to as REM1 and the latter 
as MUCAl in Table 2. A production run of MUCAREM simulation is referred to as 
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Figure 12: Distributions of a pair of dihedral angles (</>, ip) of Gly-2 for: (a) T = 200 K 
from a regular canonical MD simulation, (b) T = 200 K from the replica-exchange MD 
simulation, (c) T = 700 K from a regular canonical MD simulation, and (d) T = 700 K 
from the replica-exchange MD simulation. 
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Figure 13: Average total potential energy of Met-enkephalin in gas phase as a function of 
temperature. The solid curve is the result from the replica-exchange MD simulation and 
the dots are those of regular canonical MD simulations. 
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MUCAREM1 in Table 2, and it uses the same density of states that was obtained from 
REM1. Finally, a production run of the original REM simulation was also performed for 
comparison and it is referred to as REM2 in Table 2. The total simulation time for the 
three production runs (REM2, MUCAl, and MUCAREM1) was all set equal (i.e., 5 ns). 

Table 2: Summary of Parameters in REM, REMUCA, and MUCAREM Simulations 



Run No. of Replicas, M Temperature, T m (K) (m = 1, • • ■ , M) MD Steps 



REM1 


10 


200, 239, 286, 342, 409, 
489, 585, 700, 836, 1000 


2 x 10 5 


REM2 


10 


200, 239, 286, 342, 409, 
489, 585, 700, 836, 1000 


1 x 10 6 


MUCAl 


1 


1000 


1 x 10 7 


MUCAREM1 


4 


375, 525, 725, 1000 


2.5 x 10 6 



After the simulation of REM1 is finished, we obtained the density of states, n(E), 
by the multiple-histogram reweighting techniques of Eqs. ( p0|) and (|3T|). The density of 
states will give the average values of the potential energy from Eq. fl20D , and we found 




= < E >z\= —30 kcal/mol , , , 

= < E > Tm = 195 kcal/mol . ^ ' 

Then our estimate of the density of states is reliable in the range E\ < E < Em- The 
multicanonical potential energy £^ (E) was thus determined for the three energy regions 
(E < Ei, Ei < E < Em, and E > Em) from Eq. fl49j). Namely, the multicanonical 
potential energy, E mu (E; T ), in Eq. @ and its derivative, dEm ^> T o) ^ were determined by 
fitting lnn(E) by cubic spline functions in the energy region of (—30 < E < 195 kcal/mol) 



90| . Here, we have set the arbitrary reference temperature to be To = 1000 K. Outside 
this energy region, E mu (E;T ) was linearly extrapolated as in Eq. ([49]) . 

After determining the multicanonical weight factor, we carried out a multicanonical 
MD simulation of 1 x 10 7 steps (or 5 ns) for data collection (MUCAl in Table 2). In 
Figure 14 the probability distribution obtained by MUCAl is plotted. It can be seen that 
a good flat distribution is obtained in the energy region Ei < E < Em- In Figure 14 
the canonical probability distributions that were obtained by the reweighting techniques 
at T = Ti — 200 K and T = Tm = 1000 K are also shown (these results are essentially 
identical to one another among MUCAl, MUCAREM1, and REM2, as discussed below). 
Comparing these curves with those of MUCAl in the energy regions E < E\ and E > Em 
in Figure 14, we confirm our claim in the previous section that MUCAl gives canonical 
distributions at T = Ti for E < E\ and at T = Tm for E > Em, whereas it gives a 
multicanonical distribution for Ei < E < Em- 

In the previous works of multicanonical simulations of Met-enkephalin in gas phase 



(see, for instance, Refs. [|14], |38|), at least several iterations of trial simulations were re- 
quired for the multicanonical weight determination. We emphasize that in the present 
case of REMUCA (REM1), only one simulation was necessary to determine the optimal 
multicanonical weight factor that can cover the energy region corresponding to tempera- 
tures between 200 K and 1000 K. 
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Figure 14: Probability distribution of potential energy of Met-enkephalin in gas phase 
that was obtained from MUCAl (see Table 2). The dotted curves are the probability 
distributions of the reweighted canonical ensemble at T = 200 K (left) and 1000 K (right). 

From the density of states obtained by REMUCA (i.e., REM1), we prepared the mul- 
ticanonical weight factors (or the multicanonical potential energies) for the MUCAREM 
simulation (see Eq. ([53]) ). The parameters of MUCAREM1, such as energy bounds 
and E { h } (m = l,---,M) are listed in Table 3. The choices of ^ and are, in 

general, arbitrary, but significant overlaps between the probability distributions of adja- 
cent replicas are necessary. The replica-exchange process in MUCAREM1 was tried every 
200 time steps (or 100 fs). It is less frequent than in REM1 (or REM2). This is because 
we wanted to ensure a sufficient time for system relaxation. 



Table 3: Summary of Parameters in MUCAREM1 



m 


T {m} ^ 


T { H m] (K) 


T m (K) 


£| m| (kcal/mol) 


E { H nl (kcal/mol) 


1 


200 


375 


375 


-30 


20 


2 


300 


525 


525 


-5 


65 


3 


375 


725 


725 


20 


120 


4 


525 


1000 


1000 


65 


195 



In Figure 15 the probability distributions of potential energy obtained by MUCAREM1 
are shown. As expected, we observe that the probability distributions corresponding to 
the temperature T m are essentially flat for the energy region Ej™^ < E < E^, are of the 
canonical simulation at T = for E < E]™\ and are of the canonical simulation at 
T = Tjj m} for E > E { h i} (m = l,---,M). As a result, each distribution in MUCAREM is 
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much broader than those in the conventional REM and a much smaller number of replicas 
are required in MUCAREM than in REM (M = 4 in MUCAREM versus M = 10 in 
REM). 
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Figure 15: Probability distributions of potential energy obtained from MUCAREM1 (see 
Tables 2 and 3). 

In Figure 16 the time series of potential energy for the first 500 ps of REM2 (a), 
MUCA1 (b), and MUCAREM 1 (c) are plotted. They all exhibit a random walk in po- 
tential energy space, implying that they all perfomed properly as generalized-ensemble 
algorithms. To check the validity of the canonical-ensemble expectation values calculated 
by the new algorithms, we compare the average potential energy as a function of tempera- 
ture in Figure 17. In REM2 and MUCAREM1 we used the multiple-histogram techniques 
|2]. [3|], whereas the single-histogram method |IJ was used in MUCAl. We can see a perfect 
coincidence of these quantities among REM2, MUCAl, and MUCAREM1 in Figure 17. 

We now present the results of a replica-exchange simulated tempering MC simulation 
of Met-enkephalin in gas phase [91]. The potential energy is again that of ECEPP/2 |P3|| - 



In Table 4 we summarize the parameters of the simulations that were performed. 
As described in the previous section, REST consists of two simulations: a short REM 
simulation (from which the dimensionless Helmholtz free energy, or the simulated tem- 
pering weight factor, is determined) and a subsequent ST production run. The former 
simulation is referred to as REM1 and the latter as ST1 in Table 4. In REM1 there exist 
8 replicas with 8 different temperatures (M = 8), ranging from 50 K to 1000 K as listed 
in Table 4 (i.e., T\ = 50 K and T M = T 8 = 1000 K). The same set of temperatures were 
also used in ST1. The temperatures were distributed exponentially between T\ and T M , 



following the optimal distribution found in the previous simulated annealing schedule p? 
simulated tempering run f5l| , and replica-exchange simulation |f78"| . After estimating the 
weight factor, we made a ST production run of 10 6 MC sweeps (ST1). In REM1 and 
ST1, a replica exchange and a temperature update, respectively, were tried every 10 MC 
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Figure 16: Time series of potential energy of Met-enkephalin in gas phase for one of the 
replicas in (a) REM2, (b) MUCAl, and (c) MUCAREM1 (see Tables 2 and 3 for the 
parameters of the simulations). 
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Figure 17: The average potential energy of Met-enkephalin in gas phase as a function of 
temperature. The solid, dotted, and dashed curves are obtained from REM2, MUCAl, 
and MUCAREM1, respectively (see Tables 2 and 3 for the parameters of the simulations). 



sweeps. 



Table 4: Summary of Parameters in REST Simulations 



Run 


No. of Replicas, M 


Temperature, T m (K) (m = 


= 1,- 


•,M) 


MC Sweeps 


REM1 


8 


50, 77, 118, 181, 277, 425, 


652, 


1000 


5 x 10 4 


ST1 


1 


50, 77, 118, 181, 277, 425, 


652, 


1000 


1 x 10 6 



We first check whether the replica-exchange simulation of REM1 indeed performed 
properly. For an optimal performance of REM the acceptance ratios of replica exchange 
should be sufficiently uniform and large (say, > 10%). In Table 5 we list these quantities. 
It is clear that both points are met in the sense that they are of the same order (the values 
vary between 10 % and 40 %). 

After determining the simulated tempering weight factor, we carried out a long ST 
simulation for data collection (ST1 in Table 4). In Figure 18 the time series of temperature 
and potential energy from ST1 are plotted. In Figure 18(a) we observe a random walk 
in temperature space between the lowest and highest temperatures. In Figure 18(b) the 
corresponding random walk of the total potential energy between low and high energies 
is observed. Note that there is a strong correlation between the behaviors in Figures 
18(a) and 18(b), as there should. It is known from our previous works that the global- 
minimum-energy conformation for Met-enkephalin in gas phase has the ECEPP/2 energy 
value of —12.2 kcal/mol Hence, the random walk in Figure 12(b) indeed visited 

the global- minimum region many times. It also visited high energy regions, judging from 
the fact that the average potential energy is around 15 kcal/mol at T = 1000 K [TJ], 
(see also Figure 19 below). 

For an optimal performance of ST, the acceptance ratios of temperature update should 
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Table 5: Acceptance Ratios of Replica Exchange in REM1 of Table 4 



Pair of Temperatures (K) Acceptance Ratio 



50 <- 


-> 77 


0.30 


77 <- 


-> 118 


0.27 


118 <- 


-> 181 


0.22 


181 «- 


-» 277 


0.17 


277 «- 


-» 425 


0.10 


425 «- 


-» 652 


0.27 


652 «- 


-> 1000 


0.40 
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Figure 18: Time series of (a) temperature and (b) potential energy in ST1 (see Table 4 
for the parameters of the simulation). 
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be sufficiently uniform and large. In Table 6 we list these quantities. It is clear that both 
points are met (the values vary between 26 % and 57 %); we find that the present ST run 
(ST1) indeed properly performed. We remark that the acceptance ratios in Table 6 are 
significantly larger and more uniform than those in Table 5, suggesting that ST runs can 
sample the configurational space more effectively than REM runs, provided the optimal 
weight factor is obtained. 

Table 6: Acceptance Ratios of Temperature Update in ST1 
Pair of Temperatures (K) Acceptance Ratio 



50 — ► 77 0.47 

77 — ► 50 0.47 

77 — ► 118 0.43 

118 — ► 77 0.43 

118 — ► 181 0.37 

181 — ► 118 0.42 

181 — ► 277 0.29 

277 — ► 181 0.29 

277 — ► 425 0.30 

425 — ► 277 0.26 

425 — ► 652 0.43 

652 — ► 425 0.42 

652 -> 1000 0.57 

1000 — ► 652 0.56 



We remark that the details of Monte Carlo versions of REMUCA and MUCAREM 



have also been worked out and tested with Met-enkephalin in gas phase [125]. Here in Fig- 



ure 19, we just show the average ECEPP/2 potential energy as a function of temperature 
that was calculated from the four generalized-ensemble algorithms, MUCA, REMUCA, 
MUCAREM, and REST ||125|| . The results are in good agreement. 

We have so far presented the results of generalized-ensemble simulations of Met- 
enkephalin in gas phase. However, peptides and proteins are usually in aqueous solution. 
We therefore want to incorporate rigorous solvation effects in our simulations in order to 
compare with experiments. 

Our first example with rigorous solvent effects is a multicanonical MC simulation, 
where the solvation term was included by the RISM theory |44 |. While low-energy con- 



formations of Met-enkephalin in gas phase are compact and form /3-turn structures |38 
it turned out that those in aqueous solution are extended. In Figure 20 we show the 
lowest-energy conformations of Met-enkephalin obtained during the multicanonical MC 
simulation with RISM theory incorporated [Q. They exhibit characteristics of almost 
fully extended backbone structure with large side-chain fluctuations. The results are in 
accord with the observations in NMR experiments, which also suggest extended confor- 



mations [126]. 



We also calculated an average of the end-to-end distance of Met-enkephalin as a func- 
tion of temperature. The results in aqueous solution (the present simulation) and in the 
gas phase (a previous simulation |38| ) are compared in Figure 21. The end-to-end distance 
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Figure 19: The average potential energy of Met-enkephalin in gas phase as a function 
of temperature. The results from the four generalized-ensemble algorithms, MUCA, RE- 
MUCA, MUCAREM, and REST, are superimposed. 




Figure 20: Superposition of eight representative low-energy conformations of Met- 
enkephalin obtained by the multicanonical MC simulation in aqueous solution based on 
RISM. The figure was created with RasMol 
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in aqueous solution at all temperatures varies little (around 12 A); the conformations are 
extended in the entire temperature range. On the other hand, in the gas phase, the 
end-to-end distance is small at low temperatures due to intrachain hydrogen bonds, while 
the distance is large at high temperatures, because these intrachain hydrogen bonds are 
broken. 
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Figure 21: Average end-to-end distance of Met-enkephalin in aqueous solution (SOL) and 
in gas phase (GAS) as a function of temperature. Here, the end-to-end distance is defined 
as the distance between the nitrogen atom at the N terminus and the oxygen atom at the 
C terminus. 



The same peptide was also studied by MD simulations of replica-exchange and other 
generalized-ensemble simulations in aqueous solution based on TIP3P water model |127|. 
Two AMBER force fields ||1 1 1| , |112|| were used. The number of water molecules was 526 
and they were placed in a sphere of radius of 16 A. The initial configuration is shown in 
Figure 22. 

In Figure 23 the canonical probability distributions obtained at the 24 temperatures 
from the replica-exchange simulation are shown. We see that there are enough overlaps 
between all pairs of distributions, indicating that there will be sufficient numbers of replica 
exchanges between pairs of replicas. The corresponding time series of the total potential 
energy for one of the replicas is shown in Figure 24. We do observe a random walk in 
potential energy space, which covers an energy range of as much as 2,000 kcal/mol. 

Finally, the average end-to-end distance as a function of temperature was calculated 
by the multiple-histogram reweighting techniques of Eqs. ( |3"U[ ) and (^T|). The results both 
in gas phase and in aqueous solution are shown in Figure 25. The results are in good 
agreement with those of ECEPP/2 energy plus RISM solvation theory [54 in the sense 
that Met-enkephalin is compact at low temperatures and extended at high temperatures 
in gas phase and extended in the entire temperature range in aqueous solution (compare 
Figures 21 and 25). 
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Figure 22: Initial configuration of replica-exchange MD simulations of Met-enkephalin in 
aqueous solution with 526 TIP3P water molecules. 
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Figure 23: The canonical probability distributions of the total potential energy of Met- 
enkephalin in aqueous solution obtained from the replica-exchange MD simulation at the 
24 temperatures. 
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Figure 24: Time series of the total potential energy of Met-enkephalin in aqueous solution 
obtained for one of the replicas from the replica-exchange MD simulation. Corresponding 
times series in the canonical ensemble at temperatures 250 K and 500 K are also shown. 




Figure 25: Average end-to-end distance of Met-enkephalin (a) in gas phase and (b) in 
aqueous solution as a function of temperature. 
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4 CONCLUSIONS 



In this article we have reviewed uses of generalized-ensemble algorithms in molecular 
simulations of biomolecules. A simulation in generalized ensemble realizes a random walk 
in potential energy space, alleviating the multiple-minima problem that is a common 
difficulty in simulations of complex systems with many degrees of freedom. 

Detailed formulations of the three well-known generalized-ensemble algorithms, namely, 
multicaonical algorithm (MUCA), simulated tempering (ST), and replica-exchange method 
(REM), were given. We then introduced three new generalized-ensemble algorithms that 
combine the merits of the above three methods, which we refer to as replica-exchange mul- 
ticanonical algorithm (REMUCA), replica-exchange simulated tempering (REST), and 
multicanonical replica-exchange method (MUCAREM). 

With these new methods available, we believe that we now have working simulation 
algorithms which we can use for conformational predictions of peptides and proteins 
from the first principles, using the information of their amino-acid sequence only. It is 
now high time that we addressed the question of the validity of the standard potential 
energy functions such as AMBER, CHARMM, GROMOS, ECEPP, etc. For this purpose, 
conventional simulations in the canonical ensemble are of little use because they will 
necessarily get trapped in states of local-minmum-energy states. It is therefore essential 
to use generalized-ensemble algorithms in order to test and develop accurate potential 
energy functions for biomolecular systems. Some preliminary results of comparisons of 
different versions of AMBER force fields were given in the present article. We remark 
that more detailed analyses that compare different versions of AMBER by multicanonical 
MD simulations already exist ||128|| . Likewise, the validity of solvation theories should also 



be tested. For this, RISM theory ||105 |-[ l07| can be very useful. For instance, we have 



successfully given a molecular mechanism of secondary structural transitions in peptides 
due to addition of alcohol to solvent [ 129| , which is very difficult to attain by regular 



molecular simulations. 
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